Handling verb phrase morphology in highly inflected Indian languages for Machine Translation
Abstract
Phrase-based machine translation systems are limited by the phrases they see during training. For highly inflected languages, the parallel corpora used for training rarely contain every form of a word. The problem is amplified for verbs, whose correct form depends on factors such as gender, number, and tense-aspect. We propose augmenting the phrase table with all possible forms of a verb to improve the overall accuracy of the MT system. Our system uses simple stemmers and readily available monolingual data to generate new phrase-table entries that cover the different variations of a verb. We report significant gains in BLEU for English-to-Hindi translation.
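The abstract does not spell out the augmentation procedure, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes phrase-table entries are `(source, target, score)` triples, uses a hypothetical suffix-stripping stemmer that knows a handful of Hindi verb endings grouped by tense-aspect (so each group holds the gender/number variants of one form), takes attested variants from monolingual tokens, and gives generated entries a down-weighted score through a hypothetical `penalty` parameter. The names `SUFFIX_GROUPS`, `stem_verb`, and `augment_phrase_table` are illustrative only.

```python
from collections import defaultdict

# Hypothetical suffix classes (an assumption, not the paper's stemmer): each
# group holds the gender/number variants of one tense-aspect form.
SUFFIX_GROUPS = [
    ("ता", "ती", "ते"),  # habitual participle, e.g. खाता / खाती / खाते
    ("या", "यी", "ये"),  # perfective, e.g. खाया / खायी / खाये
    ("ना", "नी", "ने"),  # infinitive, e.g. खाना / खानी / खाने
]
SUFFIX_TO_GROUP = {s: g for g in SUFFIX_GROUPS for s in g}


def stem_verb(word):
    """Strip one known verb suffix; return (stem, group) or None."""
    for suffix, group in SUFFIX_TO_GROUP.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], group
    return None


def build_form_index(monolingual_tokens):
    """Map (stem, group) -> set of surface forms attested in monolingual text."""
    index = defaultdict(set)
    for token in monolingual_tokens:
        analysis = stem_verb(token)
        if analysis is not None:
            index[analysis].add(token)
    return index


def augment_phrase_table(phrase_table, monolingual_tokens, penalty=0.5):
    """Add entries where a target-side verb is swapped for an attested variant.

    Variants must share the stem and the suffix group, so only gender/number
    changes; the English source is kept. New entries get score * penalty.
    """
    index = build_form_index(monolingual_tokens)
    seen = {(src, tgt) for src, tgt, _ in phrase_table}
    augmented = list(phrase_table)
    for src, tgt, score in phrase_table:
        words = tgt.split()
        for i, word in enumerate(words):
            analysis = stem_verb(word)
            if analysis is None:
                continue
            for variant in sorted(index.get(analysis, ())):
                if variant == word:
                    continue
                new_tgt = " ".join(words[:i] + [variant] + words[i + 1:])
                if (src, new_tgt) not in seen:
                    seen.add((src, new_tgt))
                    augmented.append((src, new_tgt, score * penalty))
    return augmented


if __name__ == "__main__":
    table = [("eats rice", "चावल खाता", 0.8)]
    mono = "वह चावल खाती है और वे रोटी खाते हैं".split()
    for entry in augment_phrase_table(table, mono):
        print(entry)
    # ('eats rice', 'चावल खाता', 0.8)
    # ('eats rice', 'चावल खाती', 0.4)
    # ('eats rice', 'चावल खाते', 0.4)
```

Restricting variants to the same suffix group keeps the English side valid, since English "eats" does not mark the subject's gender. The idea is presumably that the target language model then picks the form that agrees with the subject. A stemmer this naive also over-generates (for example, it would analyse a noun ending in "नी" as a verb), so a real system would need to filter candidates.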
Similar resources
Bidirectional machine translation in Indian languages
This paper discusses the approach adopted in the development of a bidirectional Machine Translation system for Indian languages. The approach makes use of the characteristics of the languages to simplify the process of translation. The verb-final sentence structure and the case-inflected nature of Indian language sentences have led us to adopt a verb-centered approach. The analysis is carried...
Rich morpho-syntactic descriptors for factored machine translation with highly inflected languages as target
The baseline phrase-based translation approach has limited success on translating between languages with very different syntax and morphology, especially when the translation direction is from a language with fixed word structure to a highly inflected language. There are two main points to improve on: morphological translation equivalence and long range reordering. Translating the correct surfa...
English-Latvian SMT: knowledge or data?
When phrase-based statistical machine translation (SMT) is applied to languages with relatively free word order and rich morphology, translated texts are often not fluent because of misused inflectional forms and wrong word order between phrases or even within a phrase. One possible way to improve translation quality is to apply factored models. The paper presents work on Englis...
Improving statistical machine translation by classifying and generalizing inflected verb forms
This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By replacing verb forms with the lemma of their head verb, the data sparseness problem caused by highly inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen ve...
Evaluating Machine Translation Evaluation’s BLEU Metric for English to Hindi Language Machine Translation
Machine Translation Evaluation (MTE) has been widely recognized by the Machine Translation (MT) community. The main objective of MT is to break the language barrier in a multilingual nation like India. Evaluation of MT is required for Indian languages because the same MT does not work for Indian languages as it does for European languages, owing to differences in language structure. So, there is a great need to develop ...
Publication date: 2011